Introduction: Castleman disease (CD) is a rare lymphoproliferative disorder with complex pathological features that make clinical diagnosis challenging. Current diagnosis relies on subjective assessment by pathologists, which suffers from substantial inter-observer variability and long turnaround times. Although deep learning holds promise in digital pathology, its application to CD is limited by severe class imbalance in the available data.

Methods: This study proposes MC-MIL Net, a multi-cascaded architecture that integrates quality-aware preprocessing with context-enhanced representation learning and addresses class imbalance through data decomposition and reorganization. The framework uses a four-stage cascaded analysis pipeline. First, morphological operations and HSV color-space decomposition assess tissue integrity and staining quality. Second, a sample clustering and reorganization strategy adjusts the label ratios. Third, ResNet-34/50 backbones extract features. Finally, a dual-path decision mechanism combines a macro-stream (similarity-weighted fusion) with a micro-stream (attention-based screening). A three-stage transfer learning scheme is adopted to improve generalization.
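
The abstract does not specify how the quality check or the two decision streams are computed. The following is a minimal Python/NumPy sketch under stated assumptions. Each slide is treated as a bag of patch feature vectors, which is the multiple-instance setting the name MC-MIL suggests. A patch passes the HSV check if enough pixels exceed a saturation threshold; the morphological operations are omitted. The micro-stream uses attention-based MIL pooling in the style of Ilse et al. (2018). The macro-stream weights patches by their cosine similarity to a prototype vector. All function names, thresholds (`sat_thresh`, `min_tissue_frac`), and the prototype are hypothetical and are not the authors' implementation.

```python
import numpy as np


def hsv_saturation(rgb: np.ndarray) -> np.ndarray:
    """Per-pixel HSV saturation S = (max - min) / max, with S = 0 where max == 0."""
    x = rgb.astype(np.float64) / 255.0
    cmax = x.max(axis=-1)
    cmin = x.min(axis=-1)
    return np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)


def passes_quality_check(rgb, sat_thresh=0.07, min_tissue_frac=0.5):
    """Hypothetical quality gate: keep a patch if enough pixels are stained
    (saturated) tissue rather than white background. Thresholds are assumptions."""
    tissue_frac = float((hsv_saturation(rgb) > sat_thresh).mean())
    return tissue_frac >= min_tissue_frac, tissue_frac


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def micro_stream(feats, V, w):
    """Attention-based screening (ABMIL-style): a_k = softmax_k(w^T tanh(V h_k))."""
    scores = np.tanh(feats @ V.T) @ w          # (K,) one score per patch
    attn = softmax(scores)
    return attn @ feats, attn                  # bag embedding (D,), weights (K,)


def macro_stream(feats, prototype):
    """Similarity-weighted fusion: weight each patch by the softmax of its
    cosine similarity to a reference prototype (hypothetical choice)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    weights = softmax(f @ p)                   # (K,)
    return weights @ feats, weights


def dual_path_embedding(feats, V, w, prototype):
    """Concatenate macro- and micro-stream bag embeddings for a downstream classifier."""
    macro, _ = macro_stream(feats, prototype)
    micro, _ = micro_stream(feats, V, w)
    return np.concatenate([macro, micro])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 512))          # 8 patches, 512-d (ResNet-34 pooled size)
    V = 0.01 * rng.normal(size=(64, 512))      # hypothetical attention projection
    w = rng.normal(size=64)
    bag = dual_path_embedding(feats, V, w, prototype=feats.mean(axis=0))
    print(bag.shape)                           # (1024,)
    tissue = np.full((32, 32, 3), (200, 100, 150), dtype=np.uint8)
    print(passes_quality_check(tissue))        # (True, 1.0)
```

Concatenating the two bag embeddings is only one simple way to combine the streams. The paper's actual fusion rule, and its use of ResNet-50 features (2048-d instead of 512-d), may differ.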

Results: Validation across two medical centers showed an average accuracy of 94.55% in five-fold cross-validation at the primary center, with the best AUC reaching 1.00. In cross-center validation, direct transfer learning achieved an accuracy of 81.08%, while fine-tuning with 15% of the data yielded an AUC of 1.00. Full fine-tuning with 85% of the data further improved accuracy to 90.91%.

Conclusions: This study improves diagnostic performance through quality-aware preprocessing and data-balancing strategies, and shows that fine-tuning on a small amount of data can achieve excellent results. It offers a novel approach to rare disease diagnosis. Future work will extend this framework to CD subtype classification and treatment prediction.
